Looking for data file at: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/processed_data/final_processed_data.csv Dataset Overview: ================================================================================ Shape: (613, 8) Features: - Year: float64 (Missing: 0) - Month: float64 (Missing: 0) - Hydroelectric Power: float64 (Missing: 0) - Solar Energy: float64 (Missing: 0) - Wind Energy: float64 (Missing: 0) - Geothermal Energy: float64 (Missing: 0) - Biomass Energy: float64 (Missing: 0) - Total Renewable Energy: float64 (Missing: 0)
Current working directory: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/notebooks Trying to load datasets... Loading data from: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/data Checking global energy path: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/data/Global Energy Consumption & Renewable Generation Path exists: True Datasets loaded successfully! Loading data from: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/data Checking global energy path: /Users/katejohnson/Documents/Other/Northeastern/CS6140/Course Project/cs6140-course-project/data/Global Energy Consumption & Renewable Generation Path exists: True
Global Energy Consumption & Renewable Generation Datasets ================================================================================ Dataset: continent_consumption Shape: (31, 12) Columns: - Year: int64 (Missing: 0) - World: float64 (Missing: 0) - OECD: float64 (Missing: 0) - BRICS: float64 (Missing: 0) - Europe: float64 (Missing: 0) - North America: float64 (Missing: 0) - Latin America: float64 (Missing: 0) - Asia: float64 (Missing: 0) - Pacific: float64 (Missing: 0) - Africa: float64 (Missing: 0) - Middle-East: float64 (Missing: 0) - CIS: float64 (Missing: 0) ---------------------------------------- Dataset: country_consumption Shape: (33, 45) Columns: - Year: float64 (Missing: 2) - China: float64 (Missing: 2) - United States: float64 (Missing: 2) - Brazil: float64 (Missing: 2) - Belgium: float64 (Missing: 2) - Czechia: float64 (Missing: 2) - France: float64 (Missing: 2) - Germany: float64 (Missing: 2) - Italy: float64 (Missing: 2) - Netherlands: float64 (Missing: 2) - Poland: float64 (Missing: 2) - Portugal: float64 (Missing: 2) - Romania: float64 (Missing: 2) - Spain: float64 (Missing: 2) - Sweden: float64 (Missing: 2) - United Kingdom: float64 (Missing: 2) - Norway: float64 (Missing: 2) - Turkey: float64 (Missing: 2) - Kazakhstan: float64 (Missing: 2) - Russia: float64 (Missing: 2) - Ukraine: float64 (Missing: 2) - Uzbekistan: float64 (Missing: 2) - Argentina: float64 (Missing: 2) - Canada: float64 (Missing: 2) - Chile: float64 (Missing: 2) - Colombia: float64 (Missing: 2) - Mexico: float64 (Missing: 2) - Venezuela: float64 (Missing: 2) - Indonesia: float64 (Missing: 2) - Japan: float64 (Missing: 2) - Malaysia: float64 (Missing: 2) - South Korea: float64 (Missing: 2) - Taiwan: float64 (Missing: 2) - Thailand: float64 (Missing: 2) - India: float64 (Missing: 2) - Australia: float64 (Missing: 2) - New Zealand: float64 (Missing: 2) - Algeria: float64 (Missing: 2) - Egypt: float64 (Missing: 2) - Nigeria: float64 (Missing: 2) - South Africa: float64 (Missing: 2) - Iran: float64 (Missing: 2) - Kuwait: float64 (Missing: 2) - Saudi Arabia: float64 (Missing: 2) - United Arab Emirates: float64 (Missing: 2) ---------------------------------------- Dataset: renewable_gen Shape: (28, 5) Columns: - Year: int64 (Missing: 0) - Hydro(TWh): float64 (Missing: 0) - Biofuel(TWh): float64 (Missing: 0) - Solar PV (TWh): float64 (Missing: 0) - Geothermal (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: nonrenewable_gen Shape: (8, 2) Columns: - Mode of Generation: object (Missing: 0) - Contribution (TWh): float64 (Missing: 0) ---------------------------------------- Worldwide Renewable Energy Datasets ================================================================================ Dataset: renewable_share Shape: (5603, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1311) - Year: int64 (Missing: 0) - Renewables (% equivalent primary energy): float64 (Missing: 0) ---------------------------------------- Dataset: renewable_consumption Shape: (5610, 7) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1311) - Year: int64 (Missing: 0) - Geo Biomass Other - TWh: float64 (Missing: 144) - Solar Generation - TWh: float64 (Missing: 168) - Wind Generation - TWh: float64 (Missing: 165) - Hydro Generation - TWh: float64 (Missing: 7) ---------------------------------------- Dataset: hydro_consumption Shape: (8840, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1555) - Year: int64 (Missing: 0) - Electricity from hydro (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: wind_generation Shape: (8676, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1459) - Year: int64 (Missing: 0) - Electricity from wind (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: solar_consumption Shape: (8683, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1456) - Year: int64 (Missing: 0) - Electricity from solar (TWh): float64 (Missing: 0) ---------------------------------------- Weather Conditions Dataset ================================================================================ <class 'pandas.core.frame.DataFrame'> RangeIndex: 196776 entries, 0 to 196775 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Time 196776 non-null object 1 Energy delta[Wh] 196776 non-null int64 2 GHI 196776 non-null float64 3 temp 196776 non-null float64 4 pressure 196776 non-null int64 5 humidity 196776 non-null int64 6 wind_speed 196776 non-null float64 7 rain_1h 196776 non-null float64 8 snow_1h 196776 non-null float64 9 clouds_all 196776 non-null int64 10 isSun 196776 non-null int64 11 sunlightTime 196776 non-null int64 12 dayLength 196776 non-null int64 13 SunlightTime/daylength 196776 non-null float64 14 weather_type 196776 non-null int64 15 hour 196776 non-null int64 16 month 196776 non-null int64 dtypes: float64(6), int64(10), object(1) memory usage: 25.5+ MB
None
US Renewable Energy Dataset ================================================================================ <class 'pandas.core.frame.DataFrame'> RangeIndex: 3065 entries, 0 to 3064 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 3065 non-null int64 1 Month 3065 non-null int64 2 Sector 3065 non-null object 3 Hydroelectric Power 3065 non-null float64 4 Geothermal Energy 3065 non-null float64 5 Solar Energy 3065 non-null float64 6 Wind Energy 3065 non-null float64 7 Wood Energy 3065 non-null float64 8 Waste Energy 3065 non-null float64 9 Fuel Ethanol, Excluding Denaturant 3065 non-null float64 10 Biomass Losses and Co-products 3065 non-null float64 11 Biomass Energy 3065 non-null float64 12 Total Renewable Energy 3065 non-null float64 13 Renewable Diesel Fuel 3065 non-null float64 14 Other Biofuels 3065 non-null float64 15 Conventional Hydroelectric Power 3065 non-null float64 16 Biodiesel 3065 non-null float64 dtypes: float64(14), int64(2), object(1) memory usage: 407.2+ KB
None
Global Energy Consumption & Renewable Generation Datasets ================================================================================ Dataset: continent_consumption Shape: (31, 12) Columns: - Year: int64 (Missing: 0) - World: float64 (Missing: 0) - OECD: float64 (Missing: 0) - BRICS: float64 (Missing: 0) - Europe: float64 (Missing: 0) - North America: float64 (Missing: 0) - Latin America: float64 (Missing: 0) - Asia: float64 (Missing: 0) - Pacific: float64 (Missing: 0) - Africa: float64 (Missing: 0) - Middle-East: float64 (Missing: 0) - CIS: float64 (Missing: 0) ---------------------------------------- Dataset: country_consumption Shape: (33, 45) Columns: - Year: float64 (Missing: 2) - China: float64 (Missing: 2) - United States: float64 (Missing: 2) - Brazil: float64 (Missing: 2) - Belgium: float64 (Missing: 2) - Czechia: float64 (Missing: 2) - France: float64 (Missing: 2) - Germany: float64 (Missing: 2) - Italy: float64 (Missing: 2) - Netherlands: float64 (Missing: 2) - Poland: float64 (Missing: 2) - Portugal: float64 (Missing: 2) - Romania: float64 (Missing: 2) - Spain: float64 (Missing: 2) - Sweden: float64 (Missing: 2) - United Kingdom: float64 (Missing: 2) - Norway: float64 (Missing: 2) - Turkey: float64 (Missing: 2) - Kazakhstan: float64 (Missing: 2) - Russia: float64 (Missing: 2) - Ukraine: float64 (Missing: 2) - Uzbekistan: float64 (Missing: 2) - Argentina: float64 (Missing: 2) - Canada: float64 (Missing: 2) - Chile: float64 (Missing: 2) - Colombia: float64 (Missing: 2) - Mexico: float64 (Missing: 2) - Venezuela: float64 (Missing: 2) - Indonesia: float64 (Missing: 2) - Japan: float64 (Missing: 2) - Malaysia: float64 (Missing: 2) - South Korea: float64 (Missing: 2) - Taiwan: float64 (Missing: 2) - Thailand: float64 (Missing: 2) - India: float64 (Missing: 2) - Australia: float64 (Missing: 2) - New Zealand: float64 (Missing: 2) - Algeria: float64 (Missing: 2) - Egypt: float64 (Missing: 2) - Nigeria: float64 (Missing: 2) - South Africa: float64 (Missing: 2) - Iran: float64 (Missing: 2) - Kuwait: float64 (Missing: 2) - Saudi Arabia: float64 (Missing: 2) - United Arab Emirates: float64 (Missing: 2) ---------------------------------------- Dataset: renewable_gen Shape: (28, 5) Columns: - Year: int64 (Missing: 0) - Hydro(TWh): float64 (Missing: 0) - Biofuel(TWh): float64 (Missing: 0) - Solar PV (TWh): float64 (Missing: 0) - Geothermal (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: nonrenewable_gen Shape: (8, 2) Columns: - Mode of Generation: object (Missing: 0) - Contribution (TWh): float64 (Missing: 0) ---------------------------------------- Worldwide Renewable Energy Datasets ================================================================================ Dataset: renewable_share Shape: (5603, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1311) - Year: int64 (Missing: 0) - Renewables (% equivalent primary energy): float64 (Missing: 0) ---------------------------------------- Dataset: renewable_consumption Shape: (5610, 7) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1311) - Year: int64 (Missing: 0) - Geo Biomass Other - TWh: float64 (Missing: 144) - Solar Generation - TWh: float64 (Missing: 168) - Wind Generation - TWh: float64 (Missing: 165) - Hydro Generation - TWh: float64 (Missing: 7) ---------------------------------------- Dataset: hydro_consumption Shape: (8840, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1555) - Year: int64 (Missing: 0) - Electricity from hydro (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: wind_generation Shape: (8676, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1459) - Year: int64 (Missing: 0) - Electricity from wind (TWh): float64 (Missing: 0) ---------------------------------------- Dataset: solar_consumption Shape: (8683, 4) Columns: - Entity: object (Missing: 0) - Code: object (Missing: 1456) - Year: int64 (Missing: 0) - Electricity from solar (TWh): float64 (Missing: 0) ---------------------------------------- Weather Conditions Dataset ================================================================================ <class 'pandas.core.frame.DataFrame'> RangeIndex: 196776 entries, 0 to 196775 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Time 196776 non-null object 1 Energy delta[Wh] 196776 non-null int64 2 GHI 196776 non-null float64 3 temp 196776 non-null float64 4 pressure 196776 non-null int64 5 humidity 196776 non-null int64 6 wind_speed 196776 non-null float64 7 rain_1h 196776 non-null float64 8 snow_1h 196776 non-null float64 9 clouds_all 196776 non-null int64 10 isSun 196776 non-null int64 11 sunlightTime 196776 non-null int64 12 dayLength 196776 non-null int64 13 SunlightTime/daylength 196776 non-null float64 14 weather_type 196776 non-null int64 15 hour 196776 non-null int64 16 month 196776 non-null int64 dtypes: float64(6), int64(10), object(1) memory usage: 25.5+ MB
None
US Renewable Energy Dataset ================================================================================ <class 'pandas.core.frame.DataFrame'> RangeIndex: 3065 entries, 0 to 3064 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 3065 non-null int64 1 Month 3065 non-null int64 2 Sector 3065 non-null object 3 Hydroelectric Power 3065 non-null float64 4 Geothermal Energy 3065 non-null float64 5 Solar Energy 3065 non-null float64 6 Wind Energy 3065 non-null float64 7 Wood Energy 3065 non-null float64 8 Waste Energy 3065 non-null float64 9 Fuel Ethanol, Excluding Denaturant 3065 non-null float64 10 Biomass Losses and Co-products 3065 non-null float64 11 Biomass Energy 3065 non-null float64 12 Total Renewable Energy 3065 non-null float64 13 Renewable Diesel Fuel 3065 non-null float64 14 Other Biofuels 3065 non-null float64 15 Conventional Hydroelectric Power 3065 non-null float64 16 Biodiesel 3065 non-null float64 dtypes: float64(14), int64(2), object(1) memory usage: 407.2+ KB
None
Global Energy Data Quality Assessment
================================================================================
Dataset: continent_consumption
Duplicate Rows: 0
Numerical Columns Statistics:
Year World OECD BRICS Europe North America \
count 31.00 31.00 31.00 31.00 31.00 31.00
mean 2005.00 132792.47 60396.47 41128.93 21487.74 28226.76
std 9.09 22724.12 3480.62 13849.97 899.17 1548.24
min 1990.00 101855.54 52602.49 25993.05 19643.07 24667.23
25% 1997.50 111176.98 58719.87 27504.95 20875.85 27435.17
50% 2005.00 133582.18 61545.96 38169.66 21480.61 28598.17
75% 2012.50 154853.45 62360.06 55521.62 21951.62 29295.97
max 2020.00 167553.41 64883.77 63255.57 23108.81 30424.08
Latin America Asia Pacific Africa Middle-East CIS
count 31.00 31.00 31.00 31.00 31.00 31.00
mean 7897.15 45402.02 1563.30 6851.95 5984.20 11823.96
std 1537.72 15511.85 205.51 1742.66 2245.55 1410.09
min 5373.06 24574.19 1186.26 4407.77 2581.86 10152.99
25% 6687.25 31383.56 1424.68 5355.62 4070.50 11001.98
50% 8059.59 43693.91 1570.05 6652.36 5675.44 11606.74
75% 9391.22 60760.94 1756.13 8367.78 8007.26 12083.57
max 9978.54 69582.29 1802.65 9641.27 9455.19 16049.40
----------------------------------------
Dataset: country_consumption
Missing Values:
Year 2
China 2
United States 2
Brazil 2
Belgium 2
Czechia 2
France 2
Germany 2
Italy 2
Netherlands 2
Poland 2
Portugal 2
Romania 2
Spain 2
Sweden 2
United Kingdom 2
Norway 2
Turkey 2
Kazakhstan 2
Russia 2
Ukraine 2
Uzbekistan 2
Argentina 2
Canada 2
Chile 2
Colombia 2
Mexico 2
Venezuela 2
Indonesia 2
Japan 2
Malaysia 2
South Korea 2
Taiwan 2
Thailand 2
India 2
Australia 2
New Zealand 2
Algeria 2
Egypt 2
Nigeria 2
South Africa 2
Iran 2
Kuwait 2
Saudi Arabia 2
United Arab Emirates 2
dtype: int64
Duplicate Rows: 1
Numerical Columns Statistics:
Year China United States Brazil Belgium Czechia France \
count 31.00 31.00 31.00 31.00 31.00 31.00 31.00
mean 2005.00 1923.32 2167.45 223.45 54.90 43.26 251.19
std 9.09 898.86 114.08 55.46 3.03 2.19 13.64
min 1990.00 848.00 1910.00 141.00 48.00 39.00 217.00
25% 1997.50 1076.50 2119.00 181.00 53.00 42.00 243.50
50% 2005.00 1782.00 2191.00 216.00 56.00 43.00 252.00
75% 2012.50 2866.50 2246.00 284.00 57.00 45.00 260.50
max 2020.00 3381.00 2338.00 303.00 60.00 50.00 273.00
Germany Italy Netherlands ... Australia New Zealand Algeria \
count 31.0 31.00 31.00 ... 31.00 31.00 31.00
mean 327.9 162.90 74.87 ... 112.65 17.61 37.26
std 18.4 14.02 3.98 ... 14.99 2.25 13.75
min 275.0 137.00 67.00 ... 85.00 14.00 22.00
25% 313.0 150.50 72.00 ... 102.50 16.00 24.50
50% 335.0 162.00 75.00 ... 113.00 17.00 32.00
75% 340.0 173.00 77.50 ... 126.50 19.00 48.00
max 351.0 187.00 83.00 ... 129.00 21.00 65.00
Egypt Nigeria South Africa Iran Kuwait Saudi Arabia \
count 31.00 31.00 31.00 31.00 31.00 31.00
mean 60.94 108.97 118.19 169.06 23.16 138.39
std 21.91 31.86 16.72 64.86 9.04 53.97
min 33.00 66.00 88.00 69.00 3.00 58.00
25% 40.50 79.50 106.00 110.00 16.00 91.00
50% 62.00 105.00 120.00 173.00 25.00 123.00
75% 78.50 141.50 132.50 220.00 29.00 188.50
max 97.00 160.00 144.00 269.00 38.00 219.00
United Arab Emirates
count 31.00
mean 49.06
std 20.97
min 20.00
25% 31.00
50% 44.00
75% 66.00
max 83.00
[8 rows x 45 columns]
----------------------------------------
Dataset: renewable_gen
Duplicate Rows: 0
Numerical Columns Statistics:
Year Hydro(TWh) Biofuel(TWh) Solar PV (TWh) Geothermal (TWh)
count 28.00 28.00 28.00 28.00 28.00
mean 2003.50 2974.17 245.03 57.43 57.01
std 8.23 595.94 329.28 113.34 14.85
min 1990.00 2191.67 3.88 0.09 36.42
25% 1996.75 2598.63 11.42 0.26 42.33
50% 2003.50 2718.72 74.33 2.34 55.30
75% 2010.25 3298.90 365.04 40.10 68.40
max 2017.00 4197.29 1127.31 443.55 85.34
----------------------------------------
Dataset: nonrenewable_gen
Duplicate Rows: 0
Numerical Columns Statistics:
Contribution (TWh)
count 8.00
mean 4862.04
std 6852.38
min 36.02
25% 104.04
50% 1738.95
75% 6877.95
max 19448.16
----------------------------------------
Worldwide Renewable Data Quality Assessment
================================================================================
Dataset: renewable_share
Missing Values:
Code 1311
dtype: int64
Duplicate Rows: 0
Numerical Columns Statistics:
Year Renewables (% equivalent primary energy)
count 5603.00 5603.00
mean 1993.80 10.74
std 16.28 12.92
min 1965.00 0.00
25% 1980.00 1.98
50% 1994.00 6.52
75% 2008.00 14.10
max 2021.00 86.87
----------------------------------------
Dataset: renewable_consumption
Missing Values:
Code 1311
Geo Biomass Other - TWh 144
Solar Generation - TWh 168
Wind Generation - TWh 165
Hydro Generation - TWh 7
dtype: int64
Duplicate Rows: 0
Numerical Columns Statistics:
Year Geo Biomass Other - TWh Solar Generation - TWh \
count 5610.00 5466.00 5442.00
mean 1993.83 13.46 5.48
std 16.30 47.64 39.90
min 1965.00 0.00 0.00
25% 1980.00 0.00 0.00
50% 1994.00 0.23 0.00
75% 2008.00 4.27 0.02
max 2021.00 762.78 1032.50
Wind Generation - TWh Hydro Generation - TWh
count 5445.00 5603.00
mean 15.03 147.89
std 84.73 390.19
min 0.00 0.00
25% 0.00 1.37
50% 0.00 10.69
75% 0.28 65.84
max 1861.94 4345.99
----------------------------------------
Dataset: hydro_consumption
Missing Values:
Code 1555
dtype: int64
Duplicate Rows: 0
Numerical Columns Statistics:
Year Electricity from hydro (TWh)
count 8840.00 8840.00
mean 1999.89 116.58
std 15.75 360.23
min 1965.00 0.00
25% 1988.00 0.09
50% 2004.00 3.53
75% 2013.00 30.07
max 2022.00 4340.61
----------------------------------------
Dataset: wind_generation
Missing Values:
Code 1459
dtype: int64
Duplicate Rows: 0
Numerical Columns Statistics:
Year Electricity from wind (TWh)
count 8676.00 8676.00
mean 2000.34 14.57
std 15.51 86.39
min 1965.00 0.00
25% 1990.00 0.00
50% 2004.00 0.00
75% 2013.00 0.06
max 2022.00 1848.26
----------------------------------------
Dataset: solar_consumption
Missing Values:
Code 1456
dtype: int64
Duplicate Rows: 0
Numerical Columns Statistics:
Year Electricity from solar (TWh)
count 8683.00 8683.00
mean 2000.38 5.28
std 15.50 40.10
min 1965.00 0.00
25% 1990.00 0.00
50% 2004.00 0.00
75% 2013.00 0.01
max 2022.00 1040.50
----------------------------------------
Weather Data Quality Assessment
================================================================================
| Energy delta[Wh] | GHI | temp | pressure | humidity | wind_speed | rain_1h | snow_1h | clouds_all | isSun | sunlightTime | dayLength | SunlightTime/daylength | weather_type | hour | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 | 196776.000000 |
| mean | 573.008228 | 32.596538 | 9.790521 | 1015.292780 | 79.810566 | 3.937746 | 0.066035 | 0.007148 | 65.974387 | 0.519962 | 211.721094 | 748.644347 | 0.265187 | 3.198398 | 11.498902 | 6.298329 |
| std | 1044.824047 | 52.172018 | 7.995428 | 9.585773 | 15.604459 | 1.821694 | 0.278913 | 0.069710 | 36.628593 | 0.499603 | 273.902186 | 194.870208 | 0.329023 | 1.289939 | 6.921887 | 3.376066 |
| min | 0.000000 | 0.000000 | -16.600000 | 977.000000 | 22.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 450.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 |
| 25% | 0.000000 | 0.000000 | 3.600000 | 1010.000000 | 70.000000 | 2.600000 | 0.000000 | 0.000000 | 34.000000 | 0.000000 | 0.000000 | 570.000000 | 0.000000 | 2.000000 | 5.000000 | 3.000000 |
| 50% | 0.000000 | 1.600000 | 9.300000 | 1016.000000 | 84.000000 | 3.700000 | 0.000000 | 0.000000 | 82.000000 | 1.000000 | 30.000000 | 765.000000 | 0.050000 | 4.000000 | 11.000000 | 6.000000 |
| 75% | 577.000000 | 46.800000 | 15.700000 | 1021.000000 | 92.000000 | 5.000000 | 0.000000 | 0.000000 | 100.000000 | 1.000000 | 390.000000 | 930.000000 | 0.530000 | 4.000000 | 17.000000 | 9.000000 |
| max | 5020.000000 | 229.200000 | 35.800000 | 1047.000000 | 100.000000 | 14.300000 | 8.090000 | 2.820000 | 100.000000 | 1.000000 | 1020.000000 | 1020.000000 | 1.000000 | 5.000000 | 23.000000 | 12.000000 |
US Data Quality Assessment ================================================================================
| Year | Month | Hydroelectric Power | Geothermal Energy | Solar Energy | Wind Energy | Wood Energy | Waste Energy | Fuel Ethanol, Excluding Denaturant | Biomass Losses and Co-products | Biomass Energy | Total Renewable Energy | Renewable Diesel Fuel | Other Biofuels | Conventional Hydroelectric Power | Biodiesel | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 | 3065.000000 |
| mean | 1998.042414 | 6.491028 | 0.169759 | 1.146369 | 2.015008 | 4.282404 | 36.644408 | 5.820124 | 6.976648 | 4.834706 | 46.285969 | 70.872209 | 0.428949 | 0.031752 | 15.757374 | 0.953720 |
| std | 14.747378 | 3.456934 | 0.373819 | 1.550857 | 5.774511 | 18.124793 | 46.900639 | 8.247359 | 21.911920 | 15.601717 | 64.241520 | 71.197761 | 2.687850 | 0.258149 | 32.134059 | 3.985003 |
| min | 1973.000000 | 1.000000 | -0.002000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1985.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.483000 | 0.000000 | 0.000000 | 0.000000 | 0.258000 | 2.070000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 1998.000000 | 6.000000 | 0.000000 | 0.357000 | 0.004000 | 0.000000 | 12.062000 | 0.108000 | 0.007000 | 0.000000 | 9.716000 | 50.984000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 2011.000000 | 9.000000 | 0.036000 | 1.673000 | 0.774000 | 0.001000 | 51.808000 | 12.764000 | 1.283000 | 0.000000 | 89.359000 | 126.982000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| max | 2024.000000 | 12.000000 | 2.047000 | 5.951000 | 64.040000 | 157.409000 | 183.628000 | 32.875000 | 104.420000 | 75.373000 | 233.200000 | 308.175000 | 38.344000 | 4.101000 | 117.453000 | 27.871000 |
Global Data - Renewable Generation: Year Hydro(TWh) Biofuel(TWh) Solar PV (TWh) Geothermal (TWh) 0 1990 2191.67 3.88 0.09 36.42 1 1991 2268.63 4.19 0.10 37.39 2 1992 2267.16 4.63 0.12 39.30 3 1993 2397.67 5.61 0.15 40.23 4 1994 2419.73 7.31 0.17 41.05 Columns: ['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)'] Worldwide Data - Renewable Share: Entity Code Year Renewables (% equivalent primary energy) 0 Africa NaN 1965 5.747495 1 Africa NaN 1966 6.122062 2 Africa NaN 1967 6.325731 3 Africa NaN 1968 7.005293 4 Africa NaN 1969 7.956088 Columns: ['Entity', 'Code', 'Year', 'Renewables (% equivalent primary energy)'] Plotting renewable generation trends... Available columns: ['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)']
Plotting renewable share evolution... Available columns: ['Entity', 'Code', 'Year', 'Renewables (% equivalent primary energy)']
Plotting solar and wind generation trends... Available columns: ['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)']
Available columns: ['Entity', 'Code', 'Year', 'Electricity from wind (TWh)']
Renewable Generation Data Info: Columns: ['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)'] Sample Data: Year Hydro(TWh) Biofuel(TWh) Solar PV (TWh) Geothermal (TWh) 0 1990 2191.67 3.88 0.09 36.42 1 1991 2268.63 4.19 0.10 37.39 2 1992 2267.16 4.63 0.12 39.30 3 1993 2397.67 5.61 0.15 40.23 4 1994 2419.73 7.31 0.17 41.05 Latest year in data: 2017
Visualization Summary: - Data covers years from 1990 to 2017 - Total types of renewable energy tracked: 4 - Energy types: ['Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)']
Starting weather impact analysis... Weather Data Info: Columns: ['Time', 'Energy delta[Wh]', 'GHI', 'temp', 'pressure', 'humidity', 'wind_speed', 'rain_1h', 'snow_1h', 'clouds_all', 'isSun', 'sunlightTime', 'dayLength', 'SunlightTime/daylength', 'weather_type', 'hour', 'month'] Data Types: Time object Energy delta[Wh] int64 GHI float64 temp float64 pressure int64 humidity int64 wind_speed float64 rain_1h float64 snow_1h float64 clouds_all int64 isSun int64 sunlightTime int64 dayLength int64 SunlightTime/daylength float64 weather_type int64 hour int64 month int64 dtype: object Numeric columns for analysis: ['Energy delta[Wh]', 'GHI', 'temp', 'pressure', 'humidity', 'wind_speed', 'rain_1h', 'snow_1h', 'clouds_all', 'isSun', 'sunlightTime', 'dayLength', 'SunlightTime/daylength', 'weather_type', 'hour', 'month']
Creating scatter matrix for variables: ['temp', 'wind_speed', 'GHI', 'Energy delta[Wh]']
Summary Statistics:
temp wind_speed GHI Energy delta[Wh]
count 196776.000000 196776.000000 196776.000000 196776.000000
mean 9.790521 3.937746 32.596538 573.008228
std 7.995428 1.821694 52.172018 1044.824047
min -16.600000 0.000000 0.000000 0.000000
25% 3.600000 2.600000 0.000000 0.000000
50% 9.300000 3.700000 1.600000 0.000000
75% 15.700000 5.000000 46.800000 577.000000
max 35.800000 14.300000 229.200000 5020.000000
Key Findings:
Correlation between temp and wind_speed: -0.08
Correlation between GHI and temp: 0.49
Correlation between GHI and wind_speed: 0.02
Correlation between Energy delta[Wh] and temp: 0.38
Correlation between Energy delta[Wh] and wind_speed: 0.03
Correlation between Energy delta[Wh] and GHI: 0.91
Starting energy mix analysis...
Renewable Generation Data Columns:
Index(['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)',
'Geothermal (TWh)'],
dtype='object')
Non-renewable Generation Data Columns:
Index(['Mode of Generation', 'Contribution (TWh)'], dtype='object')
Renewable Consumption Data Columns:
Index(['Entity', 'Code', 'Year', 'Geo Biomass Other - TWh',
'Solar Generation - TWh', 'Wind Generation - TWh',
'Hydro Generation - TWh'],
dtype='object')
Total Renewable Generation: 93342.04 TWh
Total Non-renewable Generation: 38896.32 TWh
Analyzing renewable energy composition...
Renewable Energy Mix Analysis for 2017: Hydro(TWh): 4197 TWh (71.7%) Biofuel(TWh): 1127 TWh (19.3%) Solar PV (TWh): 444 TWh (7.6%) Geothermal (TWh): 85 TWh (1.5%) Average Annual Growth Rates: Hydro(TWh): 3.2% per year Biofuel(TWh): 23.7% per year Solar PV (TWh): 38.5% per year Geothermal (TWh): 3.2% per year
Starting statistical analysis... Renewable Generation Data Structure: Columns: ['Year', 'Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)'] Sample data: Year Hydro(TWh) Biofuel(TWh) Solar PV (TWh) Geothermal (TWh) 0 1990 2191.67 3.88 0.09 36.42 1 1991 2268.63 4.19 0.10 37.39 2 1992 2267.16 4.63 0.12 39.30 3 1993 2397.67 5.61 0.15 40.23 4 1994 2419.73 7.31 0.17 41.05 Analyzing columns: ['Hydro(TWh)', 'Biofuel(TWh)', 'Solar PV (TWh)', 'Geothermal (TWh)'] Growth Rates Statistics (%):
Hydro(TWh) Biofuel(TWh) Solar PV (TWh) Geothermal (TWh) count 27.00 27.00 27.00 27.00 mean 3.21 23.69 38.47 3.23 std 13.23 8.98 21.22 2.48 min -26.25 7.99 11.11 -2.83 25% 0.55 18.26 23.86 1.88 50% 1.62 23.08 33.33 3.15 75% 4.33 28.90 51.34 4.48 max 44.63 45.63 97.89 8.06 Variance Analysis:
| mean | std | var | cv | |
|---|---|---|---|---|
| Hydro(TWh) | 2974.167500 | 595.936814 | 355140.686634 | 20.037097 |
| Biofuel(TWh) | 245.032500 | 329.275399 | 108422.288160 | 134.380296 |
| Solar PV (TWh) | 57.430000 | 113.343588 | 12846.768985 | 197.359548 |
| Geothermal (TWh) | 57.014286 | 14.850555 | 220.538996 | 26.047078 |
Summary Statistics: Total Generation: 93342.04 TWh Latest Year (2017) Generation Mix: Hydro(TWh): 4197.29 TWh (71.7%) Biofuel(TWh): 1127.31 TWh (19.3%) Solar PV (TWh): 443.55 TWh (7.6%) Geothermal (TWh): 85.34 TWh (1.5%) Compound Annual Growth Rate (CAGR): Hydro(TWh): 2.4% Biofuel(TWh): 23.4% Solar PV (TWh): 37.0% Geothermal (TWh): 3.2%
Key Findings from Data Exploration:
1. Data Quality:
- Minimal missing values in core variables
- No significant data quality issues
- Some outliers present in renewable generation data
2. Temporal Patterns:
- Clear upward trend in renewable energy adoption
- Significant seasonal variations in generation
- Acceleration in growth rates post-2010
3. Geographic Distribution:
- High concentration in developed countries
- Significant regional variations
- Emerging markets showing rapid growth
4. Weather Impact:
- Strong correlation with solar radiation
- Moderate wind speed dependency
- Temperature effects vary by region
5. Energy Mix:
- Increasing share of renewables
- Hydro and wind dominate renewable sources
- Solar showing fastest growth rate
Next Steps:
1. Feature Engineering:
- Create weather-based features
- Calculate growth rates and trends
- Generate regional indicators
2. Preprocessing:
- Handle outliers in generation data
- Normalize weather variables
- Create consistent time series format